
Add KV cache quantization types #114

Merged · 2 commits merged into main from ryan/add-kv-cache-quantization-options on Jan 15, 2025

Conversation

ryan-the-crayon (Collaborator):

Adds KV cache quantization configuration

ryan-the-crayon requested a review from yagil on November 6, 2024.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from 3486867 to aa568e0 on January 15, 2025 at 20:02.
@@ -0,0 +1,14 @@
import { z } from "zod";
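The diff excerpt only surfaces the zod import. A plausible sketch of what the rest of the new 14-line file defined, assuming the value list mirrors llama.cpp's supported KV cache quantization types (the exact entries are an assumption, not taken from the diff):

import { z } from "zod";

// Assumed value list mirroring llama.cpp's KV cache quantization types;
// the actual entries in the PR may differ.
export const llmLlamaCacheQuantizationTypes = [
  "f32",
  "f16",
  "q8_0",
  "q4_0",
  "q4_1",
  "q5_0",
  "q5_1",
] as const;

export type LLMLlamaCacheQuantizationType =
  (typeof llmLlamaCacheQuantizationTypes)[number];

export const llmLlamaCacheQuantizationTypeSchema = z.enum(
  llmLlamaCacheQuantizationTypes,
);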

Member:

since we'll (probably?) have MLX types here too, let's drop Llama from the file name. This is the pattern we use in LLMLoadModelConfig.ts. Btw, shouldn't it just live in that file?

Member:

I deleted this file, but kept llama in the variable names since the MLX KV cache quantization implementation requires a different type.

Member:

In variable name 👍
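For context, a hypothetical illustration of why the llama and MLX backends would need distinct types, as the reply above notes: llama.cpp selects a KV cache quantization by type name, whereas MLX-style quantized caches are typically parameterized by bit width and group size. Neither shape below is taken from this PR:

// Name-based selection, as in the llama-specific type above.
type LlamaCacheQuantization = "f16" | "q8_0" | "q4_0";

// Parameter-based selection, as MLX typically exposes it.
interface MlxCacheQuantization {
  bits: number; // e.g. 4 or 8
  groupSize: number; // e.g. 64
}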

LLMLlamaCacheQuantizationType,
llmLlamaCacheQuantizationTypes,
llmLlamaCacheQuantizationTypeSchema,
} from "./llm/LLMLlamaCacheQuantizationType";
yagil (Member) commented on Jan 15, 2025:

Suggested change:
- } from "./llm/LLMLlamaCacheQuantizationType";
+ } from "./llm/LLMLlamaCacheQuantizationType.js";

but probably should move to LLMLoadModelConfig anyway

Member:

I moved these to LLMLoadModelConfig
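A minimal sketch of what the final arrangement might look like, assuming the type, value list, and schema now sit in LLMLoadModelConfig.ts and are consumed as optional fields on the load config (the field names here are illustrative, not confirmed by the diff):

// In LLMLoadModelConfig.ts, alongside the definitions moved from the
// deleted file. Separate K and V cache fields are an assumption.
export const llmLoadModelConfigSchema = z.object({
  // ...other load options...
  llamaKCacheQuantizationType: llmLlamaCacheQuantizationTypeSchema.optional(),
  llamaVCacheQuantizationType: llmLlamaCacheQuantizationTypeSchema.optional(),
});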

neilmehta24 requested a review from yagil on January 15, 2025 at 21:23.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from f9009cf to 6f1e8f5 on January 15, 2025 at 21:27.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from 6f1e8f5 to 13c4a57 on January 15, 2025 at 21:58.
neilmehta24 merged commit d9f72c0 into main on Jan 15, 2025.
neilmehta24 deleted the ryan/add-kv-cache-quantization-options branch on January 15, 2025.